Add helper ops to support cache conflict misses #2571
This pull request was exported from Phabricator. Differential Revision: D55926421
Summary: This diff adds helper operators that enable cache conflict miss support in SSD TBE. Changes include:

- Extend `get_unique_indices_cuda` to compute and return the inverse linear indices (the tensor that contains the original positions of the linear indices before sorting).
- Extend `lru_cache_find_uncached_cuda` to compute and return the inverse cache sets (the tensor that contains the original positions of the cache sets of unique indices before sorting).
- Update the SSD backend to support cache conflict misses instead of failing. Rows that experience conflict misses are stored in a scratch pad for TBE kernels to consume, and are evicted to SSD once the backward+optimizer step of TBE completes.
- Add `ssd_generate_row_addrs` to generate row addresses of data fetched from SSD (the data can be in either a scratch pad or the LXU cache).

Reviewed By: q10

Differential Revision: D55926421
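For readers unfamiliar with the two ideas in the summary, here is a rough, CPU-only Python sketch. It is not the FBGEMM implementation and does not call any FBGEMM operator. The helper names `sort_with_inverse` and `place_rows`, the toy cache geometry, and the placement policy are all assumptions made for illustration. The first helper shows what "original positions before sorting" means; the second shows one plausible way a row that finds its cache set full could fall back to a scratch pad slot instead of causing a failure.

```python
from typing import List, Tuple


def sort_with_inverse(values: List[int]) -> Tuple[List[int], List[int]]:
    """Sort values and return, for each sorted entry, its original position.

    Hypothetical helper: mirrors the idea of "inverse" indices described in
    the summary, not the actual output layout of any FBGEMM operator.
    """
    # Python's sort is stable, so equal values keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_values = [values[i] for i in order]
    return sorted_values, order


def place_rows(cache_sets: List[int], num_sets: int, ways: int) -> List[Tuple[str, int]]:
    """Toy placement policy (an assumption, not the SSD TBE policy).

    Each row takes the next free way in its cache set. If the set is already
    full, the row is a conflict miss and gets a scratch pad slot instead.
    """
    used_ways = [0] * num_sets
    next_scratch_slot = 0
    placements: List[Tuple[str, int]] = []
    for cache_set in cache_sets:
        if used_ways[cache_set] < ways:
            # Flat slot number inside the cache: set offset plus way.
            placements.append(("cache", cache_set * ways + used_ways[cache_set]))
            used_ways[cache_set] += 1
        else:
            placements.append(("scratch_pad", next_scratch_slot))
            next_scratch_slot += 1
    return placements


linear_indices = [7, 2, 9, 2, 5]
sorted_indices, inverse_indices = sort_with_inverse(linear_indices)

# Three rows map to set 0 of a 2-way cache, so the third one is a conflict miss.
placements = place_rows([0, 0, 0, 1], num_sets=2, ways=2)
```

In this sketch, `inverse_indices[k]` gives the position in `linear_indices` that `sorted_indices[k]` came from, which is the information a later kernel would need to map sorted results back to the original lookup order.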
This pull request has been merged in 56d21a0.